Abstract: Effective fault classification of rolling element bearings provides an important
basis for ensuring the safe operation of rotating machinery. In this paper, a novel vibration
sensor-based fault diagnosis method using an Ellipsoid-ARTMAP network (EAM) and a
differential evolution (DE) algorithm is proposed. The original features are first extracted
from vibration signals based on wavelet packet decomposition. Then, a minimum-redundancy
maximum-relevancy algorithm is introduced to select the most prominent features and thereby
reduce the feature dimension. Finally, a DE-based EAM (DE-EAM) classifier is constructed
to perform the fault diagnosis. The major characteristic of EAM is that the sample
distribution of each category is represented by hyper-ellipsoid nodes that are updated with a
smoothing operation. Therefore, it can depict the decision boundary of dispersed samples
accurately and effectively avoid over-fitting. To optimize the EAM network
parameters, the DE algorithm is employed, and two objectives, the classification
accuracy and the number of nodes, are used simultaneously as fitness functions.
Meanwhile, an exponential criterion is proposed for the final selection of the optimal
parameters. To prove the effectiveness of the proposed method, vibration signals of
four rolling element bearing states under different loads were collected. Moreover, to
improve the robustness of the classifier evaluation, a two-fold cross-validation scheme is
adopted and the order of feature samples is randomly rearranged ten times within each fold.
The results show that the DE-EAM classifier can recognize the fault categories of rolling
element bearings reliably and accurately.
1. Introduction

Fault diagnosis, which includes fault detection and isolation (FDI) [1,2], fault tolerant control
(FTC) [3,4] and fault classification [5,6], plays an important role in automation systems, process
engineering and mechanical equipment. Within these research fields, fault recognition of rolling element
bearings has attracted increasing attention. Rolling element bearings are among the most critical
mechanical components and are widely used in rotating machinery such as motors, pumps,
compressors and wind power generators. However, bearing defects may cause malfunctions or even
catastrophic failure of the machinery if they are not diagnosed in time. Available studies
have shown that bearing faults account for 40% of all machinery failures [7,8]. Therefore, effective
fault monitoring is necessary to reduce maintenance costs and avoid unscheduled downtime.
Unlike some diagnostic methods used in control systems and industrial processes, data-driven
models are commonly adopted for condition monitoring and fault diagnosis of rolling element
bearings [5,6,9]. Compared with the model-driven methods used in [1-4], the advantage of
data-based modeling is that the model can be built without understanding the complex internal
mechanics of the mechanical system. For rolling element bearings, an accurate analytical dynamic model
is hard to construct because a bearing comprises many sub-parts, including the inner race, the outer race
and the rolling elements. Therefore, data-driven models based on historical data are preferred in real
industrial applications, and their robustness depends mainly on effective signal preprocessing, feature
extraction and statistical modeling strategies.

In the past two decades, various pattern recognition techniques such as artificial neural
networks (ANNs) [10-12] and support vector machines (SVMs) [13-15] have been successfully
applied to classify bearing faults. These methods have been shown to achieve favorable
classification accuracy in some research areas [5]. However, in real applications, the useful
information is often contaminated by external noise or internal load variation. In such cases,
classifiers based on the above methods are prone to over-fitting, which degrades
classification performance [6]. At the same time, these methods describe the decision boundary
as a function of a limited number of support vectors or network weights, which makes it hard to
characterize a complex decision boundary when the training samples are dispersed and irregularly
distributed.

In this paper, Ellipsoid-ARTMAP (EAM), which is based on adaptive resonance theory (ART) [16],
is adopted to classify complex samples because of its strong local and distributed
representation ability [17-19]. Unlike the Radial Basis Function (RBF) neural network, which is used
to estimate the probability density of data streams and detect a fault by comparing the probability
density of the current data with that of previous data [20], EAM is mainly intended to depict the
spatial distribution of the sample data of each category. One major characteristic of EAM
is that distributed hyper-ellipsoid clusters are used to form the geometric representation of each
category. Therefore, the decision boundary of dispersed samples can be depicted accurately and
flexibly. Another is that a smoothing operation is performed when these ellipsoid nodes are updated,
which makes EAM insensitive to noise and helps it avoid over-fitting. Xu et al. [21] applied
the EAM network to classify gene expression data generated by DNA microarray experiments
and reported good performance. Anagnostopoulos et al. [22,23] applied EAM
to the Circle-in-a-Square classification problem and showed that EAM performed better in
clustering and classification tasks than other ARTMAP networks. However, the performance of
the EAM network is also strongly affected by how its parameters are selected. Four important
parameters need to be determined a priori: the vigilance parameter, the learning rate, the effective
diameter and the ratio of minor-to-major axis lengths. Each parameter has a significant impact on
the performance of the EAM classifier. Currently, these parameters are mainly chosen empirically
or by trial and error, which inevitably introduces inaccuracy and uncertainty into the classification
performance. To solve this problem, a differential evolution (DE) algorithm is adopted in this paper
to optimize the network parameters of EAM. One reason for selecting DE is that it has strong global
optimization ability and requires no analytical expression of the fitness function, which is essential
for EAM parameter optimization because no such expression exists for the classification process.
In addition, compared with traditional evolutionary optimization algorithms, DE has fewer control
parameters, which makes it simpler and more straightforward to implement. Meanwhile, the
differential operation on the candidate parameter vectors makes it more effective at finding a
global optimum when handling large-scale and multi-objective optimization problems [24]. To date,
the DE algorithm has been used in many applications such as system design [25], partitional
clustering [26] and classification tasks [27].

Based on the DE optimization algorithm and the EAM classifier, a vibration sensor-based fault
diagnosis system is constructed. In this framework, vibration signals are collected and wavelet packet
decomposition (WPD) is used to extract features that characterize the fault-related information.
Then, the minimum-redundancy maximum-relevancy (mRMR) feature selection method is
employed to reduce the redundancy and irrelevancy of these features. Finally, the EAM classifier is
constructed on the selected salient features to classify the faults of rolling element
bearings, and DE is integrated with EAM to optimize the classifier parameters during the training
process. To obtain satisfactory accuracy without lowering the recognition speed, both the
classification accuracy and the number of nodes are adopted as fitness functions of the DE algorithm.
Meanwhile, a new exponential selection criterion is presented to balance the influence of both
functions and choose the final parameters. To verify the effectiveness of the DE-EAM classifier,
vibration signals of four bearing states (normal, inner race fault, outer race fault and ball fault)
were collected from a test rig. Two-fold cross-validation and random ordering strategies are adopted
to evaluate the performance of the classifier robustly and accurately. The bearing fault
diagnosis results show that the DE-EAM method can accurately recognize the fault categories.

The rest of the paper is organized as follows: Section 2 explains the principles of the EAM classifier
and the DE optimization algorithm. Section 3 presents the framework of the DE-EAM-based
monitoring system and discusses the principles of WPD feature extraction and mRMR feature
selection in detail. In Section 4, the effectiveness of the proposed method is verified by
classifying four bearing states under a cross-validation and random order strategy. Conclusions
are drawn in Section 5.

2. Principle of the DE-EAM Algorithm 

2.1. Ellipsoid-ARTMAP (EAM) 
2.1.1. Network Structure 

Figure 1 shows the framework of the EAM network. A typical EAM structure
consists of three layers: the input layer F1, the representation layer F2 and the mapping layer F3.
The F1 and F2 layers are connected by the template vector $\mathbf{w}_j$, which is used to encode the
input sample into the jth node of the F2 layer. If a node passes the vigilance test, the commitment
test is then applied to it, and the winning node is mapped to the F3 layer to obtain the classification
result. During training, a match tracking (MT) process [16] is invoked when the output of the F3 layer
is incorrect.



Figure 1. The framework of the Ellipsoid-ARTMAP network. 




2.1.2. Node Representation 

Figure 2 shows a two-dimensional representation of an EAM node. Each node $j$ is described by a
template vector $\mathbf{w}_j = [\mathbf{m}_j, \mathbf{d}_j, R_j]$, where $\mathbf{m}_j$ is the center of the
hyper-ellipsoid, $\mathbf{d}_j$ is the direction vector of the node, which coincides with the direction of
the hyper-ellipsoid's major axis, and $R_j$ is called the radius of the node, which equals half the
length of the major axis.

Figure 2. Two-dimensional representation of an EAM node. 
2.1.3. Distance Calculation

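The distance between an input sample $\mathbf{x}$ and node $j$ is given by [16]:

$$\mathrm{dis}(\mathbf{x}|\mathbf{w}_j) = \max\left\{\|\mathbf{x}-\mathbf{m}_j\|_{\mathbf{C}_j},\, R_j\right\} - R_j \tag{1}$$

where $\|\mathbf{x}-\mathbf{m}_j\|_{\mathbf{C}_j}$ is the distance between sample $\mathbf{x}$ and the center
$\mathbf{m}_j$ of node $j$, which can be expressed as [22]:

$$\|\mathbf{x}-\mathbf{m}_j\|_{\mathbf{C}_j} = \begin{cases} \dfrac{1}{\mu}\sqrt{\|\mathbf{x}-\mathbf{m}_j\|_2^2 - (1-\mu^2)\left[\mathbf{d}_j^{T}(\mathbf{x}-\mathbf{m}_j)\right]^2}, & \text{if } \mathbf{d}_j \neq \mathbf{0} \\ \|\mathbf{x}-\mathbf{m}_j\|_2, & \text{if } \mathbf{d}_j = \mathbf{0} \end{cases} \tag{2}$$

where $\|\cdot\|_2$ denotes the usual Euclidean ($L_2$) norm, $\mu$ is the ratio of minor-to-major axis
lengths and $\mathbf{C}_j$ is the node shape matrix. As shown in Figure 2, the shaded area denotes the
representation region of node $j$, which is the set of points that satisfy the following condition:

$$\mathrm{dis}(\mathbf{x}|\mathbf{w}_j) = 0 \;\Leftrightarrow\; \|\mathbf{x}-\mathbf{m}_j\|_{\mathbf{C}_j} \le R_j \tag{3}$$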
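To make Equations (1)-(3) concrete, the following is a minimal NumPy sketch. It assumes that $\mathbf{d}_j$ is stored as a unit vector (all zeros until the direction is set) and that $0 < \mu \le 1$; the function names `ellipsoid_norm` and `node_distance` are hypothetical.

```python
import numpy as np

def ellipsoid_norm(x, m, d, mu):
    """||x - m_j||_{C_j} from Eq. (2).

    x, m : sample and node centre (1-D arrays)
    d    : unit major-axis direction d_j, or all zeros before it is set
    mu   : ratio of minor-to-major axis lengths, 0 < mu <= 1
    """
    diff = np.asarray(x, dtype=float) - np.asarray(m, dtype=float)
    sq = float(diff @ diff)                  # ||x - m_j||_2^2
    d = np.asarray(d, dtype=float)
    if not np.any(d):                        # d_j = 0: ordinary Euclidean norm
        return float(np.sqrt(sq))
    proj = float(d @ diff)                   # d_j^T (x - m_j)
    # max(..., 0.0) guards against tiny negative values caused by round-off
    return float(np.sqrt(max(sq - (1.0 - mu ** 2) * proj ** 2, 0.0))) / mu

def node_distance(x, m, d, R, mu):
    """dis(x | w_j) from Eq. (1); it is zero inside the node's region, Eq. (3)."""
    return max(ellipsoid_norm(x, m, d, mu), R) - R
```

With $\mu = 0.5$, $\mathbf{d}_j = [1, 0]$ and $R_j = 2$, the region of Equation (3) is an ellipse with semi-axes 2 (along $\mathbf{d}_j$) and 1, so the point [0, 1] lies exactly on its boundary and its distance to the node is 0.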

2.1.4. Vigilance Test (VT) and Commitment Test (CT) 

For an input sample $\mathbf{x}$, node $j$ has two important values, the category match function (CMF)
value $\rho^*(\mathbf{w}_j|\mathbf{x})$ and the category choice function (CCF) value $T^*(\mathbf{w}_j|\mathbf{x})$ [16]:

$$\rho^*(\mathbf{w}_j|\mathbf{x}) = 1 - \frac{R_j + \max\left\{\|\mathbf{x}-\mathbf{m}_j\|_{\mathbf{C}_j},\, R_j\right\}}{D} \tag{4}$$

$$T^*(\mathbf{w}_j|\mathbf{x}) = \frac{D - R_j - \max\left\{\|\mathbf{x}-\mathbf{m}_j\|_{\mathbf{C}_j},\, R_j\right\}}{D - 2R_j + \alpha} \tag{5}$$

where $D$ is called the effective diameter and $\alpha$ is the choice parameter. The choice parameter
$\alpha$ is a very small positive value that has no obvious influence on EAM performance; in this paper,
$\alpha = 0.001$. Uncommitted nodes have a constant CCF value $T_u = D/(2D + \alpha)$. The parameter
$\omega$ is usually chosen as $\omega > 0.5$ to ensure the stability of EAM. If the CMF value and the
CCF value of node $j$ for the given input sample $\mathbf{x}$ are larger than the initial vigilance
parameter $\rho$ and $T_u$, respectively, this node is called a committed node. These two comparisons
are called the vigilance test (VT) and the commitment test (CT), respectively.
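The VT and CT for a single node can be sketched as follows, taking $\|\mathbf{x}-\mathbf{m}_j\|_{\mathbf{C}_j}$ as an already computed input. This is an illustrative sketch only; the helper names are hypothetical, and strict inequalities are used because the text says "larger than".

```python
def cmf(dist_c, R, D):
    """Category match function rho*(w_j|x), Eq. (4); dist_c = ||x - m_j||_{C_j}."""
    return 1.0 - (R + max(dist_c, R)) / D

def ccf(dist_c, R, D, alpha=0.001):
    """Category choice function T*(w_j|x), Eq. (5)."""
    return (D - R - max(dist_c, R)) / (D - 2.0 * R + alpha)

def ccf_uncommitted(D, alpha=0.001):
    """Constant CCF value T_u of an uncommitted node."""
    return D / (2.0 * D + alpha)

def passes_vt_and_ct(dist_c, R, D, rho, alpha=0.001):
    """True if the node passes the vigilance test (VT) and the commitment test (CT)."""
    vt = cmf(dist_c, R, D) > rho                               # VT: CMF above vigilance
    ct = ccf(dist_c, R, D, alpha) > ccf_uncommitted(D, alpha)  # CT: CCF above T_u
    return vt and ct
```

For example, with $D = 1$ and $\rho = 0.5$, a node of radius 0.1 whose centre is 0.6 away from the sample has a CMF value of 0.3 and therefore fails the VT.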

2.1.5. Node Updating 



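During training, the committed node is updated according to the input sample; that is,
its template vector $\mathbf{w}_j$ is recalculated according to the formulas below [23]:

$$\mathbf{m}_j^{\text{new}} = \mathbf{m}_j^{\text{old}} + \frac{\gamma}{2}\left(1 - \frac{\min\left\{R_j^{\text{old}},\, \|\mathbf{x}-\mathbf{m}_j^{\text{old}}\|_{\mathbf{C}_j}\right\}}{\|\mathbf{x}-\mathbf{m}_j^{\text{old}}\|_{\mathbf{C}_j}}\right)\left(\mathbf{x}-\mathbf{m}_j^{\text{old}}\right) \tag{6}$$

$$R_j^{\text{new}} = R_j^{\text{old}} + \frac{\gamma}{2}\left(\max\left\{R_j^{\text{old}},\, \|\mathbf{x}-\mathbf{m}_j^{\text{old}}\|_{\mathbf{C}_j}\right\} - R_j^{\text{old}}\right) \tag{7}$$

$$\mathbf{d}_j = \frac{\mathbf{x}^{(2)} - \mathbf{m}_j}{\left\|\mathbf{x}^{(2)} - \mathbf{m}_j\right\|_2} \tag{8}$$

where $\gamma \in (0,1]$ denotes the learning rate. If $\gamma$ is set to 1, EAM performs fast learning.
$\mathbf{x}^{(2)}$ represents the second sample encoded by node $j$.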

A two-dimensional illustration of node updating based on Equations (6)-(8) is shown in Figure 3. It
can be seen that an EAM node can only grow in size and is never destroyed during the training
process. The node's new representation region is the minimum hyper-ellipsoid that contains both the
old region and the new sample to be encoded. In addition, once a node's direction vector is set,
it does not change during subsequent updates. Moreover, because the minimum enclosing
hyper-ellipsoid is used to represent the old region and the new pattern, the old and new ellipsoids
$E_j^{\text{old}}$ and $E_j^{\text{new}}$ touch at only one point.



Figure 3. The updating process of a two-dimensional EAM node. 
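A sketch of one node update under Equations (6)-(8), written with NumPy, is given below. The bookkeeping argument `n_encoded` (how many samples the node has already encoded) is a hypothetical addition used to detect the second sample of Equation (8), and `update_node` is likewise a hypothetical name.

```python
import numpy as np

def ellipsoid_norm(x, m, d, mu):
    """||x - m_j||_{C_j} from Eq. (2); d is all zeros until the direction is set."""
    diff = np.asarray(x, dtype=float) - np.asarray(m, dtype=float)
    sq = float(diff @ diff)
    if not np.any(d):
        return float(np.sqrt(sq))
    proj = float(np.asarray(d, dtype=float) @ diff)
    return float(np.sqrt(max(sq - (1.0 - mu ** 2) * proj ** 2, 0.0))) / mu

def update_node(x, m, d, R, mu, gamma, n_encoded):
    """Update a committed node with sample x following Eqs. (6)-(8).

    n_encoded is the number of samples the node encoded before x
    (hypothetical bookkeeping). Returns the new (m_j, d_j, R_j).
    """
    x, m, d = (np.asarray(v, dtype=float) for v in (x, m, d))
    if n_encoded == 1:
        # Eq. (8): the second encoded sample fixes the major-axis direction
        diff = x - m
        length = np.linalg.norm(diff)
        if length > 0:
            d = diff / length
    dist = ellipsoid_norm(x, m, d, mu)
    if dist <= R:
        # x already lies in the region: min{R, dist} = dist and max{R, dist} = R,
        # so Eqs. (6)-(7) leave m_j and R_j unchanged
        return m, d, R
    # Eq. (6) with min{R_j, dist} = R_j, and Eq. (7) with max{R_j, dist} = dist
    m_new = m + 0.5 * gamma * (1.0 - R / dist) * (x - m)
    R_new = R + 0.5 * gamma * (dist - R)
    return m_new, d, R_new
```

Encoding [0, 0] and then [2, 0] with $\gamma = 1$ and $\mu = 0.5$ gives a node centred at [1, 0] with radius 1 and major axis along [1, 0]; a third sample at [1, 1] then moves the centre to [1, 0.25] and grows the radius to 1.5, which leaves [1, 1] on the new boundary.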







2.1.6. Training and Classification 



The complete training process of the EAM classifier is given in the following steps: 
Step 1: The network parameters are initialized. 

Step 2: The distance is calculated and Equation (3) is used to judge whether the training
sample lies within the representation region of node $j$. If not, the input sample
undergoes the VT: nodes whose CMF value $\rho^*(\mathbf{w}_j|\mathbf{x})$ is larger than $\rho$ are
placed into a candidate set $S$; the other nodes are discarded.
Step 3: The CCF values of all member nodes in $S$ are calculated and compared with $T_u$. Nodes that
fail the CT are removed from $S$. If no nodes pass both the VT and the CT (i.e., $S$ is
empty), a new node is created and initialized.
Step 4: If $S$ is non-empty after the VT and CT, the label of the training sample is compared with
that of the chosen node $j$. If the selected node $j$ has the same label as the input sample,
the template vector $\mathbf{w}_j$ is updated. If the label is incorrect, the match tracking (MT)
process is activated. The flowchart of EAM training is given in Figure 4.

Figure 4. Flowchart of EAM training. 
EAM classification is similar to the training process; the only difference is that the MT process
is omitted when EAM is used for classification. If no node is committed, the output of the
EAM classifier is -1, which denotes an abnormal result.
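Putting Steps 1-4 and the classification rule together gives the compact classifier below. It is a simplified sketch, not the authors' implementation: the class name `SimpleEAM` is hypothetical, the winning node is assumed to be the candidate with the largest CCF value, and match tracking is approximated by simply moving on to the next candidate when the label is wrong (full ARTMAP match tracking instead raises the vigilance before searching again).

```python
import numpy as np

def _norm_c(x, m, d, mu):
    """||x - m_j||_{C_j}, Eq. (2)."""
    diff = x - m
    sq = float(diff @ diff)
    if not np.any(d):
        return float(np.sqrt(sq))
    proj = float(d @ diff)
    return float(np.sqrt(max(sq - (1.0 - mu ** 2) * proj ** 2, 0.0))) / mu

class SimpleEAM:
    """Simplified EAM classifier (hypothetical sketch)."""

    def __init__(self, D, mu, rho, gamma, alpha=0.001):
        self.D, self.mu, self.rho, self.gamma, self.alpha = D, mu, rho, gamma, alpha
        self.nodes = []  # each node: centre m, direction d, radius R, label, samples encoded n

    def _candidates(self, x):
        """Steps 2-3: nodes passing the VT and CT, best CCF value first."""
        t_u = self.D / (2.0 * self.D + self.alpha)
        scored = []
        for node in self.nodes:
            big = max(_norm_c(x, node["m"], node["d"], self.mu), node["R"])
            cmf = 1.0 - (node["R"] + big) / self.D                                      # Eq. (4)
            ccf = (self.D - node["R"] - big) / (self.D - 2.0 * node["R"] + self.alpha)  # Eq. (5)
            if cmf > self.rho and ccf > t_u:
                scored.append((ccf, node))
        scored.sort(key=lambda s: -s[0])
        return [node for _, node in scored]

    def _update(self, node, x):
        """Eqs. (6)-(8) for the node that resonated with x."""
        if node["n"] == 1:
            diff = x - node["m"]
            length = np.linalg.norm(diff)
            if length > 0:
                node["d"] = diff / length                                               # Eq. (8)
        dist = _norm_c(x, node["m"], node["d"], self.mu)
        if dist > node["R"]:
            node["m"] = node["m"] + 0.5 * self.gamma * (1.0 - node["R"] / dist) * (x - node["m"])  # Eq. (6)
            node["R"] = node["R"] + 0.5 * self.gamma * (dist - node["R"])                          # Eq. (7)
        node["n"] += 1

    def fit(self, X, y):
        for x, label in zip(np.asarray(X, dtype=float), y):
            for node in self._candidates(x):
                if node["label"] == label:   # Step 4: correct label, learn the sample
                    self._update(node, x)
                    break
                # wrong label: simplified match tracking, try the next candidate
            else:                            # no acceptable node: commit a new one
                self.nodes.append({"m": x.copy(), "d": np.zeros_like(x), "R": 0.0,
                                   "label": label, "n": 1})
        return self

    def predict(self, X):
        """Classification: no match tracking; -1 if no node is committed."""
        labels = []
        for x in np.asarray(X, dtype=float):
            cands = self._candidates(x)
            labels.append(cands[0]["label"] if cands else -1)
        return np.array(labels)
```

On two small, well-separated 2-D clusters, this sketch creates one node per cluster and returns -1 for a query far from both, mirroring the abnormal-output rule above.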

As mentioned above, the following four parameters have an important influence on the performance
of the EAM classifier: the effective diameter $D$, the ratio of minor-to-major axis lengths $\mu \in (0,1]$,
the vigilance parameter $\rho \in [0,1]$ and the learning rate $\gamma \in (0,1]$. The vigilance parameter
$\rho$ is a threshold that serves as a necessary condition for judging whether the input belongs to the
corresponding node. The learning rate $\gamma$ gives EAM a flexible learning ability ranging from slow
learning to fast learning. The effective diameter $D$ is chosen to ensure the stability of EAM. The ratio
of minor-to-major axis lengths $\mu$ determines the shape of the hyper-ellipsoid node. To improve the
accuracy of the classifier, the differential evolution (DE) algorithm is introduced to obtain the optimal
combination of these parameters. The principle of DE is described in the following section.

2.2. Differential Evolution (DE) 

As a member of the class of evolutionary optimization algorithms, the DE algorithm uses three classical
operators, mutation, crossover and selection, to generate trial parameter vectors and select the final
solution with the best fitness [28]. The main advantage of evolutionary algorithms is their global search
ability, which reduces the risk of becoming trapped in a local minimum. In contrast, gradient-based local
search algorithms are not suitable for optimizing the EAM network structure [29,30] because no
analytical expression of the objective function is available. Moreover, although there
are many variants, the classical DE/rand/1/bin algorithm is adopted in this paper because of its
maturity, robustness and fast convergence [31-33].

Each generation in DE comprises $N_p$ individuals (the population size). Each individual is expressed
as $\mathbf{Y}_{i,g}$, where $i \in \{1, \ldots, N_p\}$ is the index of the individual and $g$ is the
generation number. Each individual contains the $V$ parameters that need to be optimized. In this
paper, four parameters of the EAM network need to be optimized ($V = 4$), that is,
$\mathbf{Y}_{i,g} = \{D, \mu, \rho, \gamma\}$. The main steps of the DE algorithm are as follows:

Step 1: Initialization 

Set the boundaries of the optimized parameters and initialize their values in the first generation
according to a uniform probability distribution.
Step 2: Mutation 

The mutation operation adds a scaled difference vector to a population vector. For each
vector $\mathbf{Y}_{i,g}$, the mutant vector is formed by the following rule:

$$\mathbf{V}_{i,g+1} = \mathbf{Y}_{r_1,g} + F\left(\mathbf{Y}_{r_2,g} - \mathbf{Y}_{r_3,g}\right) \tag{9}$$

where $\mathbf{Y}_{r_1,g}$, $\mathbf{Y}_{r_2,g}$ and $\mathbf{Y}_{r_3,g}$ are selected from the population
randomly. The indices $r_1$, $r_2$ and $r_3$ are mutually different and also differ from the current
index $i$ (i.e., $r_1 \neq r_2 \neq r_3 \neq i$). $F \in [0,1]$ is a scaling factor that controls the
amplitude of the difference vector $(\mathbf{Y}_{r_2,g} - \mathbf{Y}_{r_3,g})$.
Step 3: Crossover

After the mutation operation, DE uses a crossover operation to build the trial vectors. Specifically,
the elements of the parent vector $\mathbf{Y}_{i,g}$ are combined with those of the mutant vector
$\mathbf{V}_{i,g+1}$ to produce the elements of the trial vector $\mathbf{U}_{i,g+1}$, which are defined as:

$$U_{j,i,g+1} = \begin{cases} V_{j,i,g+1}, & \text{if } (\mathrm{rand}_j \le C) \text{ or } (j = r) \\ Y_{j,i,g}, & \text{otherwise} \end{cases} \tag{10}$$

where $j = 1, 2, \ldots, V$, $\mathrm{rand}_j \in [0,1]$ is a random number, $C \in [0,1]$ is a predefined
crossover parameter and $r \in \{1, 2, \ldots, V\}$ is a randomly selected index; the condition $j = r$
ensures that at least one parameter of the offspring differs from its parent.
Step 4: Selection 

The selection operation chooses the better of the parent and offspring vectors according to the
following equation:

$$\mathbf{Y}_{i,g+1} = \begin{cases} \mathbf{U}_{i,g+1}, & \text{if } f(\mathbf{U}_{i,g+1}) < f(\mathbf{Y}_{i,g}) \\ \mathbf{Y}_{i,g}, & \text{otherwise} \end{cases} \tag{11}$$

where $f(\cdot)$ denotes the fitness function to be minimized. The vector with the better fitness
value is carried into the next generation. Hence, the selection process guarantees that the population
fitness either improves or at least maintains the best value found so far.
Step 5: Termination 

The mutation, crossover and selection operations are repeated until the maximum generation
number $g_{\max}$ is reached. The choice of $g_{\max}$ depends on how the fitness value fluctuates:
generally, the fitness function stabilizes as the iterations proceed. The flowchart of DE is given
in Figure 5.



Figure 5. Flowchart of DE iterative optimization. 
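The five steps can be sketched as a generic DE/rand/1/bin minimizer. This is an illustrative sketch under stated assumptions: the function name and signature are hypothetical, mutant vectors are clipped to the parameter bounds (a boundary-handling choice the text does not specify), and a single scalar fitness is minimized, whereas the paper uses two fitness functions.

```python
import numpy as np

def de_rand_1_bin(fitness, lower, upper, n_pop=40, F=0.5, C=0.2, g_max=20, seed=0):
    """Minimise `fitness` inside box bounds with DE/rand/1/bin (Eqs. (9)-(11))."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    V = lower.size                                     # parameters per individual
    # Step 1: uniform random initialisation inside the bounds
    pop = lower + rng.random((n_pop, V)) * (upper - lower)
    fit = np.array([fitness(y) for y in pop])
    for _ in range(g_max):                             # Step 5: fixed generation budget
        new_pop, new_fit = pop.copy(), fit.copy()
        for i in range(n_pop):
            # Step 2, Eq. (9): r1, r2, r3 mutually distinct and different from i
            others = [k for k in range(n_pop) if k != i]
            r1, r2, r3 = rng.choice(others, size=3, replace=False)
            mutant = pop[r1] + F * (pop[r2] - pop[r3])
            mutant = np.clip(mutant, lower, upper)     # assumption: stay inside the bounds
            # Step 3, Eq. (10): binomial crossover; index r always comes from the mutant
            mask = rng.random(V) <= C
            mask[rng.integers(V)] = True
            trial = np.where(mask, mutant, pop[i])
            # Step 4, Eq. (11): keep the trial vector only if it is better
            f_trial = fitness(trial)
            if f_trial < fit[i]:
                new_pop[i], new_fit[i] = trial, f_trial
        pop, fit = new_pop, new_fit
    best = int(np.argmin(fit))
    return pop[best], fit[best], pop, fit
```

For EAM, `fitness` would train a classifier with the parameter vector $\{D, \mu, \rho, \gamma\}$ and return its validation error; the defaults $F = 0.5$, $C = 0.2$, $N_p = 40$ and $g_{\max} = 20$ mirror the settings reported in Section 4.3. Building the next generation from a copy keeps Equation (9) tied to generation $g$.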
3. Diagnosis System Based on DE-EAM

3.1. Framework 

The framework of the DE-EAM-based fault diagnosis system is shown in Figure 6. The whole system
is composed of four parts: (1) data collection; (2) feature extraction; (3) feature
selection; (4) fault classification. Vibration signals from different bearing fault types are collected
and WPD is adopted to extract the characteristic features. Then, the mRMR method is used to reduce
the redundancy and irrelevancy of the original features. Finally, the EAM classifier is constructed to
perform bearing fault classification, and the DE algorithm is used to optimize the network parameters
of the EAM classifier. Each part is described in detail in the sections that follow.



Figure 6. Architecture of bearing fault diagnosis system. 
3.2. Data Collection



Vibration analysis, owing to its simplicity and effectiveness, has been widely used in bearing fault
diagnosis [34-36]. Konar and Chattopadhyay [14] stated that the vibration response is the most
reliable means of detecting and diagnosing localized defects. In this paper, vibration signals were
collected and used to depict the dynamic characteristics of rolling element bearings. As is well known,
a vibration signal contains both low-frequency impact information and high-frequency structural
information. Therefore, the WPD algorithm is adopted in the following section to extract the
characteristics of the rolling element bearings across the whole frequency range.

3.3. Feature Extraction 

WPD is a multi-resolution signal processing algorithm. Its main characteristic is that both the
low-frequency and the high-frequency components are decomposed further at each level. The
decomposition structure of WPD is shown in Figure 7. In this paper, the input data $d$ denotes the
collected vibration signal of length $N$, with $d_0^0 = d$. The corresponding wavelet coefficients of
$d$ at the $j$th level of WPD can be written as [37]:

$$d_j^n(k) = \sum_m h_0(m-2k)\, d_{j-1}^{n/2}(m), \quad n \text{ even} \tag{12}$$

$$d_j^n(k) = \sum_m h_1(m-2k)\, d_{j-1}^{(n-1)/2}(m), \quad n \text{ odd} \tag{13}$$
where $n$ is the sub-band index, and $h_0$ and $h_1$ are a pair of orthogonal decomposition filters:
$h_0$ is a low-pass filter and $h_1$ is a high-pass filter. Decomposing the signal into different frequency
bands weakens the influence of noise, so the useful signal component at each level is correspondingly
enhanced, which improves the robustness of the extracted features. The sub-band energies $E_j^n$ at the
$j$th level are then selected as the initial features, calculated as:

$$E_j^n = \sum_k \left| d_j^n(k) \right|^2 \tag{14}$$



As described above, when the WPD decomposition level is set to $u$, the number of features is
$2^u$. The energy feature of each sub-band is calculated according to Equation (14), and the feature
set is formed as $F = \{E_u^0, E_u^1, \ldots, E_u^{2^u-1}\}$.



Figure 7. High and low frequency decomposition of vibration signals based on WPD. 
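Equations (12)-(14) can be sketched directly in NumPy. The text does not name the wavelet, so the Haar filter pair below is an assumption, as is the periodic extension at the signal boundary; the sub-bands come out in natural (Paley) order, i.e., nodes $2n$ and $2n+1$ are the children of node $n$, rather than in frequency order.

```python
import numpy as np

H0 = np.array([1.0, 1.0]) / np.sqrt(2.0)   # assumed Haar low-pass filter h0
H1 = np.array([1.0, -1.0]) / np.sqrt(2.0)  # assumed Haar high-pass filter h1

def _analysis(c, h):
    """sum_m h(m - 2k) c(m), k = 0..len(c)/2 - 1, with periodic extension."""
    n = len(c)
    idx = (2 * np.arange(n // 2)[:, None] + np.arange(len(h))[None, :]) % n
    return c[idx] @ h

def wpd_energy_features(signal, level, h0=H0, h1=H1):
    """Sub-band energies E_level^n, n = 0..2**level - 1, Eqs. (12)-(14)."""
    bands = [np.asarray(signal, dtype=float)]   # d_0^0 = d
    for _ in range(level):
        children = []
        for c in bands:                         # node n at level j-1 -> nodes 2n, 2n+1 at level j
            children.append(_analysis(c, h0))   # Eq. (12): even index, low-pass
            children.append(_analysis(c, h1))   # Eq. (13): odd index, high-pass
        bands = children
    return np.array([np.sum(c ** 2) for c in bands])   # Eq. (14)
```

For a 2048-point sample and $u = 4$, `wpd_energy_features(sample, 4)` returns the 16 energies that form one row of the feature set $F$. Because the Haar pair is orthonormal, the 16 energies sum to the energy of the raw sample, which is a convenient sanity check.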
3.4. Feature Selection

Irrelevant and redundant features degrade the performance of a classifier. In this study, a
sequential forward selection algorithm, minimum-redundancy maximum-relevance (mRMR) [38], is
employed to search for the optimal feature combination among the original features. The mRMR
algorithm uses mutual information (MI) to select the features that best satisfy the minimal-redundancy
(Min-Redundancy) and maximal-relevance (Max-Relevance) criteria.

Within the mRMR algorithm, MI is used to measure the dependence between two variables so as to
quantify both relevance and redundancy. MI is defined as:

$$I(x,y) = \iint p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)}\,dx\,dy \tag{15}$$

where $x$ and $y$ are two random variables, $p(x,y)$ is their joint probability density, and $p(x)$ and
$p(y)$ denote the marginal probability densities.
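Equation (15) is defined for continuous densities; in practice the features are usually discretized first. A minimal plug-in estimate based on a 2-D histogram is sketched below; the histogram estimator, the natural logarithm and the default bin count are assumptions, not necessarily the estimator used in [38].

```python
import numpy as np

def mutual_information(x, y, bins=10):
    """Histogram (plug-in) estimate of I(x, y) in nats, a discrete stand-in for Eq. (15)."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    p_xy = joint / joint.sum()                 # joint probabilities p(x, y)
    p_x = p_xy.sum(axis=1, keepdims=True)      # marginal p(x), shape (bins, 1)
    p_y = p_xy.sum(axis=0, keepdims=True)      # marginal p(y), shape (1, bins)
    nz = p_xy > 0                              # empty cells contribute 0 to the sum
    return float(np.sum(p_xy[nz] * np.log(p_xy[nz] / (p_x @ p_y)[nz])))
```

Two identical variables that take four equally likely values give $I = \log 4$, while two independent binary variables give $I = 0$.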

$F$ is defined as the original feature set, which includes $h$ features, and $c$ is the class label. Let
$F_a$ denote the set of already selected features with $m$ features and $F_b$ the candidate feature set
with $n$ features, where $m + n = h$. First, the relevance of each feature $f$ in $F_b$ is calculated by:

$$D = I(f,c) \tag{16}$$

Then, the redundancy $R$ of feature $f$ in $F_b$ with respect to the features $f_j$ in $F_a$ is calculated by [39]:
$$R = \frac{1}{m}\sum_{j=1}^{m} I(f, f_j) \tag{17}$$



Finally, the feature in $F_b$ with maximum relevance and minimum redundancy is selected according to
the following rule [39]:

$$\max_{f \in F_b}\left[ I(f,c) - \frac{1}{m}\sum_{f_j \in F_a} I(f, f_j) \right] \tag{18}$$



Following the mRMR criterion, the selected feature is removed from $F_b$ and put into $F_a$, and the
calculation starts over with the updated $F_b$ and $F_a$. Initially, $F_a$ is empty (i.e., $m = 0$) and
$F_b$ contains all $h$ features (i.e., $n = h$). The relevance of the features in $F_b$ is calculated
based on Equation (16), and the feature with the maximum relevance is selected as the first feature
and put into $F_a$. This evaluation process continues for $h-1$ further rounds, and an mRMR-ranked
feature set $F' = \{f_1, f_2, \ldots, f_k, \ldots, f_h\}$ is obtained. The first $k$ features are then
selected to form the final feature set $D = \{f_1, f_2, \ldots, f_k\}$, which is fed into the
DE-EAM-based classifier to perform multi-fault recognition of the rolling element bearings.
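The incremental search of Equations (16)-(18) can be sketched as follows. The sketch reuses a histogram MI estimate (an assumption) so that the block is self-contained, and `mrmr_rank` is a hypothetical name.

```python
import numpy as np

def mutual_information(x, y, bins=10):
    """Histogram (plug-in) estimate of I(x, y) in nats."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    p_xy = joint / joint.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)
    p_y = p_xy.sum(axis=0, keepdims=True)
    nz = p_xy > 0
    return float(np.sum(p_xy[nz] * np.log(p_xy[nz] / (p_x @ p_y)[nz])))

def mrmr_rank(F, c, bins=10):
    """Return feature (column) indices of F in mRMR order, Eqs. (16)-(18)."""
    F = np.asarray(F, dtype=float)
    h = F.shape[1]
    relevance = np.array([mutual_information(F[:, j], c, bins) for j in range(h)])  # Eq. (16)
    selected = [int(np.argmax(relevance))]        # first feature: maximum relevance
    remaining = [j for j in range(h) if j != selected[0]]
    while remaining:                              # h - 1 further rounds
        scores = []
        for j in remaining:
            redundancy = np.mean([mutual_information(F[:, j], F[:, s], bins)
                                  for s in selected])          # Eq. (17)
            scores.append(relevance[j] - redundancy)           # criterion of Eq. (18)
        best = remaining[int(np.argmax(scores))]
        selected.append(best)
        remaining.remove(best)
    return selected
```

With the 16 WPD energies as the columns of `F`, `mrmr_rank(F, labels)[:6]` would give the indices of the $k = 6$ selected features under this particular estimator. In the check below, a duplicated feature is ranked last because its redundancy cancels its relevance.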

4. Experimentation and Validation 

4.1. Data Preparation 

To verify the effectiveness of the DE-EAM-based diagnosis method, experimental data of four
bearing states (normal bearing, inner race fault, ball fault and outer race fault) under different
operating loads (1 hp, 2 hp, 3 hp) were collected. The experimental setup of the test rig is shown in
Figure 8. A deep groove ball bearing (type: 6205-2RS JEM SKF) supported the motor shaft; its
parameters are: inside diameter 0.9843 in; outside diameter 2.0472 in; ball diameter 0.3126 in;
pitch diameter 1.537 in. An accelerometer was mounted at the drive end of the motor to
collect vibration signals. The faults were introduced by electrical discharge machining
(EDM). The sampling frequency was set to 12,000 Hz because the vibration information of the rolling
bearing is mainly concentrated within the band this rate covers (up to the 6 kHz Nyquist frequency).



Figure 8. Experiment setup: (a) Picture of test rig; (b) Framework of test. 
Each bearing state was tested under three loads (1 hp, 2 hp, 3 hp), and 100 samples were
collected under each load. Each sample contains 2048 points, covering more than five
shaft rotation periods. Hence, there are 300 samples for each bearing state and 1200 samples in
total. Figure 9 shows the time domain waveforms of the four bearing states under the three loads.

Figure 9. Time domain waveforms of bearing vibration signals under different loads.
(a)-(d) denote the normal bearing, inner race fault, ball fault and outer race fault, respectively. The
subscripts 1, 2 and 3 represent the 1 hp, 2 hp and 3 hp loads, respectively.
WPD is used to extract features from the original vibration samples. Here, the decomposition level
$u$ is set to 4, so the original feature dimension is $2^4 = 16$ and the feature set $F$ has size
1200 × 16. The mRMR method is then applied to search for salient features, with the target
dimension $k$ set to 6. After mRMR-based feature selection, the first six ranked features (fourth-level
sub-band energies $E_4^n$) are selected sequentially from $F$ to construct the new feature subset $D$
of size 1200 × 6. Figure 10 shows the spatial distribution of the features for the four bearing states.
Some feature data of different fault states overlap with each other, which makes the decision
boundary harder to represent. In addition, Figure 11 shows the influence of load on the spatial
distribution of the features for each bearing state. The distribution ranges of the features under
different loads differ noticeably even when the fault state is the same, which makes it more difficult
to depict the spatial distribution of each class and to judge the bearing states accurately.

Figure 10. Spatial distribution of the features for the four bearing states.
Figure 11. Spatial distribution of the features for each bearing state under three different loads:
(a) normal; (b) inner race fault; (c) ball fault; (d) outer race fault. 
4.2. Cross-Validation

Cross-validation is a widely used scheme for evaluating the performance of a classifier. In v-fold
cross-validation [40], all samples are randomly divided into v subsets of equal size and v iterations
are performed: in each iteration, a different subset is used for validation while the remaining v-1
subsets are used for training to construct the classifier model. After v iterations, all results are
averaged to produce the final estimate of classifier performance. The major advantage of
cross-validation is that all data are used for both training and validation, and each sample is used
for validation exactly once. As a result, the accuracy estimate has a lower variance than an
estimate based on a single training and validation split. In this paper, 2-fold cross-validation is
adopted to split the feature set D into training and validation samples and to evaluate the
performance of the classifier.
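One possible reading of the evaluation protocol, 2-fold cross-validation repeated over ten random orderings of the samples, is sketched below. The generator name is hypothetical, and the folds are not stratified by class because the text does not say they are.

```python
import numpy as np

def repeated_v_fold_indices(n_samples, v=2, repeats=10, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for v-fold CV over random orderings."""
    rng = np.random.default_rng(seed)
    for r in range(repeats):
        order = rng.permutation(n_samples)      # a new random arrangement of the samples
        folds = np.array_split(order, v)        # v subsets of (nearly) equal size
        for k in range(v):
            test_idx = folds[k]                 # this subset is used for validation
            train_idx = np.concatenate([folds[j] for j in range(v) if j != k])
            yield r, k, train_idx, test_idx
```

For the 1200 samples used here, this yields 20 train/validation splits of 600 samples each; within every repetition, each sample is validated exactly once, and because `train_idx` keeps the permuted order, the presentation order seen by an order-sensitive learner such as EAM also changes between repetitions.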



4.3. Parameter Optimization

For the EAM classifier, four network parameters, the vigilance parameter $\rho$, the learning rate
$\gamma$, the effective diameter $D$ and the ratio of minor-to-major axis lengths $\mu$, need to be
optimized using the DE optimization algorithm. The control parameters of DE are set as follows:
scaling factor $F = 0.5$, crossover rate $C = 0.2$ and population size $N_p = 40$. In addition, the
maximum generation number $g_{\max}$ is set to 20.

The classification error ($C_e$) and the relative number of nodes ($R_{nod} = n_d/M$, where $n_d$ is
the number of EAM nodes and $M$ is the dimensionality of the samples) are the two most important
indicators for evaluating the performance of the EAM classifier. $C_e$ directly reflects the
classification ability: the smaller the error, the higher the ability. $n_d$ reflects the computing
speed: the fewer the nodes, the faster the network. In this paper, both $C_e$ and $R_{nod}$ are
selected as fitness functions for the DE optimization. Figure 12 shows the variation of the two fitness
functions with the generation number. Both fitness curves level off after 12 generations; therefore,
$g_{\max} = 20$ is sufficient to guarantee the convergence of the DE algorithm.



Figure 12. Variation of the fitness functions with the generation number. 
After the maximum generation is reached, 40 individuals are obtained, each containing four
EAM parameters. To obtain the final solution, a compound indicator $O_t$ is calculated for every
candidate solution, and the final index $i$ of the optimized parameters corresponds to the minimum
$O_t$ value, that is:

$$i = \arg\min_{t \in \{1, \ldots, N_p\}} O_t \tag{19}$$

where $O_t$ is the exponential compound indicator of $C_e(t)$ and $R_{nod}(t)$, which represent the
classification error and the relative number of nodes of the $t$th individual in the final generation,
respectively. After the index $i$ is determined by Equation (19), the final solution vector
$\mathbf{Y}_{i,g_{\max}}$, which contains the four optimized parameters, is used to construct the EAM
classifier.



4.4. Classification and Analysis 

To evaluate the proposed DE-EAM classifier accurately, the order of the feature samples is randomly
rearranged ten times and two-fold cross-validation is adopted simultaneously. Figure 13
shows the average classification accuracy over the four bearing states in each case. In this
figure, the x axis denotes the repetition index and the y axis represents the classification accuracy.
The minimum accuracy is 94.75%, the maximum accuracy is 97.5%, and the overall average
classification accuracy reaches 96.1%.

Figure 13. Average accuracy of the EAM classifier under different folds and
rearrangement orders.
To further show the capability of the DE-EAM classifier, the recognition accuracy for each
fault state is also calculated and listed in Table 1. For the normal bearing, the
minimum accuracy is 98% and the maximum is 100%. The ball fault has the lowest recognition
accuracy, although its average accuracy still reaches 92.9%. These results indicate that DE-EAM
performs well in recognizing the fault types of rolling element bearings.



Table 1. The classification accuracy of DE-EAM classifier for each bearing fault state. 
5. Conclusions

In this paper, a novel monitoring system combining an EAM network with a DE optimization
algorithm is proposed for fault diagnosis of rolling element bearings. Within this framework, the
original features are first extracted from the vibration signals using WPD to characterize the
low-frequency and high-frequency characteristics of the rolling element bearing. Then, the mRMR
feature selection method is used to reduce the redundancy and irrelevancy of the feature vectors.
Based on the selected features, the DE-EAM method is used to build the corresponding
classifier for diagnosing the rolling element bearings. The main advantage of the EAM
classifier lies in the use of hyper-ellipsoids to represent the geometric shape of the feature spatial
distribution and of a smoothing operation to obtain a more accurate decision boundary.
Moreover, the DE algorithm is integrated with EAM to obtain the optimal network parameters within
a limited number of evolutionary generations. To verify the robustness and accuracy of the
DE-EAM-based system, samples of four bearing states were collected and evaluated using random
reordering and a two-fold cross-validation strategy, with the EAM classifier constructed from the
network parameters optimized by DE. The classification results for the four bearing states show that
the overall average accuracy is 96.1% and the maximum average accuracy for recognizing a single
fault state reaches 99.8%, which demonstrates that the DE-EAM method can accurately diagnose
faults of rolling element bearings.